Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome). It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons; humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
The goal of this approach is to identify genetic variants that alter protein sequences, and to do this at a much lower cost than whole-genome sequencing. Since these variants can be responsible for both Mendelian and common polygenic diseases, such as Alzheimer's disease, whole exome sequencing has been applied both in academic research and as a clinical diagnostic.
In the past, clinical genetic tests were chosen based on the clinical presentation of the patient (i.e. focused on one gene or a small number known to be associated with a particular syndrome), or surveyed only certain types of variation (e.g. comparative genomic hybridization), but provided definitive genetic diagnoses in fewer than half of all patients. Exome sequencing is now increasingly used to complement these other tests: both to find mutations in genes already known to cause disease and to identify novel genes by comparing exomes from patients with similar features.
Though many techniques have been described for targeted capture, only a few of these have been extended to capture entire exomes. The first target enrichment strategy to be applied to whole exome sequencing was the array-based hybrid capture method in 2007, but in-solution capture has gained popularity in recent years.
Roche NimbleGen was first to take the original DGS technology and adapt it for next-generation sequencing. They developed the Sequence Capture Human Exome 2.1M Array to capture ~180,000 coding exons. This method is both time-saving and cost-effective compared to PCR-based methods. The Agilent Capture Array and the comparative genomic hybridization array are other methods that can be used for hybrid capture of target sequences. Limitations of this technique include the need for expensive hardware as well as a relatively large amount of DNA.
In-solution capture was developed to improve on the array-based hybridization capture target-enrichment method. In solution capture (as opposed to array-based hybrid capture), there is an excess of probes targeting the regions of interest relative to the amount of template required. The optimal target size is about 3.5 megabases, which yields excellent sequence coverage of the target regions. The preferred method depends on several factors, including the number of base pairs in the region of interest, the demands for reads on target, and the equipment available in house.
Although exome sequencing is more expensive than hybridization-based technologies on a per-sample basis, its cost has been decreasing due to the falling cost and increased throughput of whole genome sequencing.
False positive and false negative findings are critical issues in genomic resequencing approaches. A few strategies have been developed to improve the quality of exome data.
Causal variants for rare recessive disorders may be absent from public databases of single nucleotide polymorphisms (SNPs) such as dbSNP, whereas more common recessive phenotypes are more likely to have disease-causing variants reported there. For example, the most common cystic fibrosis variant has an allele frequency of about 3% in most populations. Screening out such variants might erroneously exclude the corresponding genes from consideration. Genes for recessive disorders are usually easier to identify than those for dominant disorders because genes are less likely to carry more than one rare nonsynonymous variant. Filters that screen out common genetic variants rely on dbSNP, which may not hold accurate information about allele variation. Lists of common variation drawn from study individuals who have undergone exome or genome-wide sequencing would be more reliable. A challenge in this approach is that as the number of sequenced exomes increases, the number of uncommon variants recorded in dbSNP will also increase. It will therefore be necessary to develop thresholds to define the common variants that are unlikely to be associated with a disease phenotype.
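The difference between excluding every variant that appears in a reference list and excluding only variants above a frequency threshold can be sketched as follows. This is a minimal illustration, not a description of any particular pipeline: the `Variant` record, the variant names, the gene labels and the 5% cut-off are all hypothetical, and only the roughly 3% allele frequency echoes the cystic fibrosis example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    gene: str
    variant_id: str
    # Allele frequency in the reference list; None means the variant is absent.
    reference_af: Optional[float]


def drop_if_listed(variants: List[Variant]) -> List[Variant]:
    """Keep only variants that do not appear in the reference list at all."""
    return [v for v in variants if v.reference_af is None]


def drop_if_common(variants: List[Variant], max_af: float) -> List[Variant]:
    """Keep variants that are unlisted or whose frequency is at most max_af."""
    return [v for v in variants
            if v.reference_af is None or v.reference_af <= max_af]


# Hypothetical calls from one patient exome.
calls = [
    Variant("GENE_A", "novel_1", None),       # not in the reference list
    Variant("GENE_B", "listed_3pct", 0.03),   # listed, but only ~3% frequency
    Variant("GENE_C", "listed_25pct", 0.25),  # clearly common
]

MAX_AF = 0.05  # hypothetical threshold, chosen only for this example

presence_kept = [v.variant_id for v in drop_if_listed(calls)]
threshold_kept = [v.variant_id for v in drop_if_common(calls, MAX_AF)]

print(presence_kept)   # ['novel_1']
print(threshold_kept)  # ['novel_1', 'listed_3pct']
```

In this toy data the presence-based filter discards the 3% variant, and with it GENE_B, while the threshold-based filter retains it. Where the threshold should sit is exactly the open question the paragraph raises.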
Genetic heterogeneity and population ethnicity are also major limitations, as they may increase the number of false positive and false negative findings, which makes the identification of candidate genes more difficult. It is possible to reduce the stringency of the thresholds in the presence of heterogeneity and ethnicity; however, this will also reduce the power to detect variants. Using a genotype-first approach to identify candidate genes might also offer a solution to overcome these limitations.
Unlike common variant analysis, the analysis of rare variants in whole-exome sequencing studies evaluates variant sets rather than single variants. SNP annotations predict the effect or function of rare variants and help prioritize rare functional variants. Incorporating these annotations can effectively boost the power of genetic association of rare variants analysis of whole genome sequencing studies. Some methods and tools have been developed to perform functionally informed rare variant association analysis by incorporating functional annotations to empower analysis in whole exome sequencing studies.
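One simple way to evaluate a variant set rather than a single variant is to collapse the rare variants in each gene into a per-sample burden score, weighting each variant by an annotation-derived score. The sketch below assumes that approach purely for illustration; it is not one of the published methods referred to above, and the sample names, variant identifiers, gene labels and weights are all hypothetical.

```python
from collections import defaultdict
from typing import Dict


def burden_scores(
    genotypes: Dict[str, Dict[str, int]],
    variant_gene: Dict[str, str],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> Dict[str, Dict[str, float]]:
    """Return {gene: {sample: weighted count of rare alternate alleles}}.

    genotypes maps sample -> {variant_id: alternate allele count (0, 1 or 2)}.
    weights maps variant_id -> annotation weight; unlisted variants get
    default_weight.
    """
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sample, calls in genotypes.items():
        for variant_id, alt_count in calls.items():
            gene = variant_gene[variant_id]
            weight = weights.get(variant_id, default_weight)
            scores[gene][sample] += weight * alt_count
    return {gene: dict(per_sample) for gene, per_sample in scores.items()}


# Hypothetical rare-variant calls for three samples.
genotypes = {
    "case_1": {"v1": 1, "v2": 1},
    "case_2": {"v1": 2},
    "control_1": {"v3": 1},
}
variant_gene = {"v1": "GENE_1", "v2": "GENE_1", "v3": "GENE_2"}

# Hypothetical annotation weights: higher means predicted more damaging.
weights = {"v1": 1.0, "v2": 0.5, "v3": 0.0}

scores = burden_scores(genotypes, variant_gene, weights)
print(scores["GENE_1"])  # {'case_1': 1.5, 'case_2': 2.0}
print(scores["GENE_2"])  # {'control_1': 0.0}
```

The per-gene scores would then be compared between cases and controls with a statistical test, which is omitted here. The point of the sketch is only that the annotation weights change how much each rare variant contributes to the set-level signal.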
Exome sequencing in rare variant gene discovery remains a very active and ongoing area of research, and there is growing evidence that a significant burden of risk is observed across sets of genes. Exome sequencing has been reported to identify rare variants in the KRT82 gene in the autoimmune disorder alopecia areata.
Subsequently, another group reported successful clinical diagnosis of a suspected Bartter syndrome patient of Turkish origin. Bartter syndrome is a renal salt-wasting disease. Exome sequencing revealed an unexpected well-conserved recessive mutation in a gene called SLC26A3 which is associated with congenital chloride diarrhea (CLD). This molecular diagnosis of CLD was confirmed by the referring clinician. This example provided proof of concept of the use of whole-exome sequencing as a clinical tool in evaluation of patients with undiagnosed genetic illnesses. This report is regarded as the first application of next generation sequencing technology for molecular diagnosis of a patient.
A second report described exome sequencing of individuals with a Mendelian disorder known as Miller syndrome (MIM#263750), a rare disorder of autosomal recessive inheritance. Two siblings and two unrelated individuals with Miller syndrome were studied. The authors looked at variants that have the potential to be pathogenic, such as non-synonymous mutations, splice acceptor and donor sites, and short coding insertions or deletions. Since Miller syndrome is a rare disorder, it is expected that the causal variant has not been previously identified. Previous exome sequencing studies of common single nucleotide polymorphisms (SNPs) in public SNP databases were used to further exclude candidate genes. After exclusion of these genes, the authors found mutations in DHODH that were shared among individuals with Miller syndrome. Each individual with Miller syndrome was a compound heterozygote for the DHODH mutations, which were inherited: each parent of an affected individual was found to be a carrier.
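The filtering logic described in this study can be sketched roughly as follows: keep potentially pathogenic variant classes, discard variants already present in a reference list, require at least two remaining variants per gene in each affected individual (consistent with compound heterozygosity under recessive inheritance), and intersect the surviving genes across individuals. This is an illustrative reconstruction under those assumptions, not the authors' actual pipeline; all gene labels, variant identifiers and effect names are hypothetical, and homozygous variants are ignored for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# (gene, variant_id, predicted effect)
Call = Tuple[str, str, str]

# Variant classes treated as potentially pathogenic in this sketch.
QUALIFYING_EFFECTS = {"nonsynonymous", "splice_site", "coding_indel"}


def genes_with_two_novel_variants(calls: Iterable[Call],
                                  known_ids: Set[str]) -> Set[str]:
    """Genes carrying at least two distinct qualifying, unlisted variants."""
    novel = {(gene, vid) for gene, vid, effect in calls
             if effect in QUALIFYING_EFFECTS and vid not in known_ids}
    per_gene = Counter(gene for gene, _ in novel)
    return {gene for gene, n in per_gene.items() if n >= 2}


def shared_candidate_genes(per_individual: Dict[str, List[Call]],
                           known_ids: Set[str]) -> List[str]:
    """Genes that pass the per-individual filter in every affected individual."""
    gene_sets = [genes_with_two_novel_variants(calls, known_ids)
                 for calls in per_individual.values()]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Hypothetical calls for three affected individuals.
affected = {
    "individual_1": [
        ("GENE_X", "x1", "nonsynonymous"), ("GENE_X", "x2", "nonsynonymous"),
        ("GENE_Y", "y1", "nonsynonymous"), ("GENE_Y", "y2", "splice_site"),
        ("GENE_Z", "z1", "synonymous"),
    ],
    "individual_2": [
        ("GENE_X", "x1", "nonsynonymous"), ("GENE_X", "x3", "coding_indel"),
        ("GENE_Y", "y1", "nonsynonymous"), ("GENE_Y", "y2", "splice_site"),
    ],
    "individual_3": [
        ("GENE_X", "x2", "nonsynonymous"), ("GENE_X", "x4", "splice_site"),
        ("GENE_W", "w1", "nonsynonymous"), ("GENE_W", "w2", "nonsynonymous"),
    ],
}
known_ids = {"y1"}  # hypothetical variants already in a public SNP list

candidates = shared_candidate_genes(affected, known_ids)
print(candidates)  # ['GENE_X']
```

In the toy data, GENE_Y drops out because one of its two variants is already listed, and GENE_W drops out because it is seen in only one individual. Only GENE_X satisfies the recessive model in all three.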
This was the first time exome sequencing was shown to identify a novel gene responsible for a rare Mendelian disease. This finding demonstrates that exome sequencing has the potential to locate causative genes in complex diseases, which previously has not been possible due to limitations in traditional methods. Targeted capture and massively parallel sequencing represent a cost-effective, reproducible and robust strategy with high sensitivity and specificity to detect variants causing protein-coding changes in individual human genomes.
Once a genetic cause of a disease has been diagnosed, this information may guide the selection of appropriate treatment. The first time this strategy was performed successfully in the clinic was in the treatment of an infant with inflammatory bowel disease. A number of conventional diagnostics had previously been used, but the results could not explain the infant's symptoms. Analysis of exome sequencing data identified a mutation in the XIAP gene. Knowledge of this gene's function guided the infant's treatment, leading to a bone marrow transplantation which cured the child of the disease.
Researchers have used exome sequencing to identify the underlying mutation for a patient with Bartter syndrome and congenital chloride diarrhea. Bilgüvar's group also used exome sequencing and identified the underlying mutation for a patient with severe brain malformations, stating "These highlight the use of whole exome sequencing to identify disease loci in settings in which traditional methods have proved challenging... Our results demonstrate that this technology will be particularly valuable for gene discovery in those conditions in which mapping has been confounded by locus heterogeneity and uncertainty about the boundaries of diagnostic classification, pointing to a bright future for its broad application to medicine".
Researchers at the University of Cape Town, South Africa used exome sequencing to discover the genetic mutation of CDH2 as the underlying cause of a genetic disorder known as arrhythmogenic right ventricular cardiomyopathy (ARVC), which increases the risk of heart disease and cardiac arrest. [1]
In November 2012, DNADTC, a division of Gene by Gene, started offering exomes at 80X coverage at an introductory price of $695. According to the DNADTC website, the price is currently $895. In October 2013, BGI announced a promotion for personal whole exome sequencing at 50X coverage for $499. In June 2016, Genos achieved an even lower price of $399 with a CLIA-certified 75X consumer exome sequenced from saliva.
A 2018 review of 36 studies found the cost of exome sequencing to range from US$555 to US$5,169, with a diagnostic yield ranging from 3% to 79% depending on the patient group.